I am modeling the distributions of the four Enyalius study species (E. catenatus, E. pictus, E. perditus, and E. iheringii) to understand their current distributions and responses to long-term historical climate change.

I will build robust models of their current distributions using recent environmental data (i.e., the last few decades). Then, I will build similar models with variables that are available for both present-day and historical (i.e., the Last Glacial Maximum) time periods. I will use the current models as benchmarks for the performance of the models intended for hindcasting.

library(sf)
library(spThin)
library(janitor)
library(here)
library(rasterVis)
library(tidyverse)
library(ENMeval)
library(patchwork)
library(terra)

Current

Variable selection

I’m choosing variables that encompass the range of the Enyalius species, are relevant to their physiology, and are obtainable at a reasonable spatial resolution.

Bioclims:
Using the BrazilClim dataset: Ramoni-Perazzi, P., Passamani, M., Thielen, D., Padovani, C., Arizapana-Almonacid, M.A., 2021. BrazilClim: The overcoming of limitations of pre-existing bioclimate data. Int. J. Climatol. https://doi.org/10.1002/joc.7325. Downloaded on 2022-02-25.
1 km resolution.

Land Cover:
Using the forest land cover rasters (classes 1-4) from Tuanmu, M.-N. and W. Jetz. 2014. A global 1-km consensus land-cover product for biodiversity and ecosystem modeling. Global Ecology and Biogeography 23(9): 1031-1045. Downloaded on 2022-02-26.
1 km resolution.

Variable processing

I’m cropping the variables to a relevant extent before going further with the individual SDMs. I’m cropping them with a 1 degree buffer around the entire species group’s range.

First, I’m reading in the localities and plotting them to make sure there isn’t anything wild. Everything looks reasonable!

# read in as a data frame first
locs_df <- read_csv(here("analysis", "data", "enyalius_locs.csv")) %>% 
  # the variable names are messy
  janitor::clean_names() %>%
  filter(!is.na(latitude), !is.na(longitude),
         # remove localities that are only mapped to a Google Earth municipality. The reserve in the "GPS of the reserve" locality is barely a km, so that is acceptable error relative to the environmental resolution
         source_lat_long != "Google Earth municipality") %>%
  # some lat/longs contain spaces; strip all whitespace before conversion
  mutate(longitude = str_remove_all(longitude, "\\s"), 
         latitude = str_remove_all(latitude, "\\s")) %>% 
  # convert to numeric
  mutate(longitude = as.numeric(longitude),
         latitude = as.numeric(latitude))

# convert to sf for spatial operations
locs_sf <- st_as_sf(locs_df, 
                    coords = c("longitude", "latitude"),
                    crs = 4326)

# convert to vect for interacting with terra
locs_vec <- vect(locs_sf)

# atlantic forest shapefile for plotting
af <- read_sf(here("analysis", "data", "atlantic_forest", "atlantic_forest.geojson"))

# plot
ggplot() +
  geom_sf(data = st_geometry(af)) +
  geom_sf(data = locs_sf)

Creating a convex hull with a 1 degree buffer.

mcp_all <- st_convex_hull(st_union(locs_sf)) %>%
  st_buffer(dist = 1.0) %>% 
  terra::vect()

plot(st_geometry(af))
plot(mcp_all, add=TRUE)
plot(locs_vec, add=TRUE)

Crop bioclims.

bioclims <- terra::rast(list.files(here("analysis", "data", "brazil_clim"), full.names = TRUE)) %>% 
  terra::crop(mcp_all) %>% 
  terra::mask(mcp_all)

# fix the names  
names(bioclims) <- str_remove(names(bioclims), "BC1.0_")


plot(bioclims$bio_01)
plot(st_geometry(af), add=TRUE)
plot(mcp_all, add=TRUE)
plot(locs_vec, add=TRUE)

Crop forest cover. The class 01 variable, needleleaf forest, isn’t present in the AF, so that’s why you don’t see it here.

forest_cover <- terra::rast(list.files(here("analysis", "data", "forest_cover"), full.names = TRUE)) %>% 
  terra::crop(mcp_all) %>% 
  terra::mask(mcp_all) 
# fix the names  
names(forest_cover) <- paste0("fc_", names(forest_cover)) %>% str_replace("\\-", "\\_")

plot(forest_cover$fc_04_mixed)
plot(st_geometry(af), add=TRUE)
plot(mcp_all, add=TRUE)
plot(locs_vec, add=TRUE)
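The absence of needleleaf forest (class 01) can be confirmed by counting non-zero cells in each forest cover layer. A quick sketch with terra:

```r
# sketch: number of cells with non-zero cover per forest class;
# a class absent from the cropped extent should sum to 0
terra::global(forest_cover > 0, fun = "sum", na.rm = TRUE)
```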

Write the cropped climate and forest cover layers to file for use in the SDMs.

terra::writeRaster(bioclims, here("analysis", "output", "cropped_predictors", "bioclims.tif"), overwrite = TRUE)
terra::writeRaster(forest_cover, here("analysis", "output", "cropped_predictors", "forest_cover.tif"), overwrite = TRUE)

General SDM steps

For each SDM, I’m going to perform the following steps:

  • spatially thin localities
  • crop the new predictors according to a minimum convex hull of the localities for each species with a 0.5 degree buffer
  • extract background environmental data (10,000 points) for predictor correlation and modeling
  • correlate predictors
  • remove predictors w/ r > 0.75, prioritizing ecologically relevant variables
  • Use Maxent for modeling
    • LQH feature classes to keep models relatively simple
    • regularization multipliers from 0.5 to 5.0 in 0.5 increments to test a wide range of regularization
    • leave-one-out cross validation for model selection due to a low number of localities
    • select models first by AICc (model fit), followed by omission error rate (prediction)

In addition, I’m reading/writing data for each species in isolation, so the code for one species isn’t dependent on the code for another, or on what I did during variable processing.
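The r > 0.75 filtering step can also be sketched with caret::findCorrelation(), which flags variables to drop at a given cutoff. I’m still making the final keep/drop decisions by hand based on ecological relevance; bg_env here is a placeholder for a data frame of predictor values at background points:

```r
library(caret)

# bg_env: placeholder data frame of predictor values at background points
cor_mat <- cor(bg_env, use = "pairwise.complete.obs", method = "pearson")

# columns caret suggests dropping so that no remaining pair has |r| > 0.75
drop_idx <- caret::findCorrelation(cor_mat, cutoff = 0.75)

# guard against the no-correlation case, where drop_idx is empty
keep_vars <- if (length(drop_idx) > 0) colnames(cor_mat)[-drop_idx] else colnames(cor_mat)
```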

E. catenatus

Reading in data for plotting.

# atlantic forest shapefile
af <- read_sf(here("analysis", "data", "atlantic_forest", "atlantic_forest.geojson"))

Spatial thin

Read and filter localities.

locs_cat <- read_csv(here("analysis", "data", "enyalius_locs.csv")) %>% 
  # the variable names are messy
  janitor::clean_names() %>%
  filter(species == "catenatus",
    !is.na(latitude), !is.na(longitude),
    # remove duplicates
    !duplicated(latitude),
         # remove localities that are only mapped to a Google Earth municipality. The reserve in the "GPS of the reserve" locality is barely a km, so that is acceptable error relative to the environmental resolution
         source_lat_long != "Google Earth municipality") %>%
  # some lat/longs contain spaces; strip all whitespace before conversion
  mutate(longitude = str_remove_all(longitude, "\\s"), 
         latitude = str_remove_all(latitude, "\\s")) %>% 
  # convert to numeric
  mutate(longitude = as.numeric(longitude),
         latitude = as.numeric(latitude)) %>% 
  # convert to sf for spatial operations
  st_as_sf(coords = c("longitude", "latitude"),
           crs = 4326,
           remove = FALSE)
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   .default = col_character(),
##   `ng/ul` = col_double(),
##   Longitude = col_double(),
##   Reads = col_double()
## )
## ℹ Use `spec()` for the full column specifications.
# plot localities
plot(st_geometry(af))
plot(st_geometry(locs_cat), add = TRUE)

Spatial thin. I’m using a 5 km thinning distance.

set.seed(29833)

#run spthin algorithm. This returns 100 possible combinations of removed localities
output <-
  spThin::thin(
    locs_cat,
    'latitude',
    'longitude',
    'species',
    thin.par = 5,
    reps = 100,
    locs.thinned.list.return = TRUE,
    write.files = FALSE,
    verbose = FALSE
  )

# I want to maximize the # of localities returned  
maxThin <- which(sapply(output, nrow) == max(sapply(output, nrow)))

# if there are multiple iterations with the max # of localities, pick one
maxThin <-
  output[[ifelse(length(maxThin) > 1, maxThin[1], maxThin)]]

# get the unused locs as a testing data set before overwriting with the thinned set
# this probably isn't useful since they will overlap heavily with the training data, but seems like another piece of info to look at
test_cat <- locs_cat[-as.numeric(rownames(maxThin)), ]

# subset locs to match only thinned locs
locs_cat <- locs_cat[as.numeric(rownames(maxThin)), ]

plot(st_geometry(af))
plot(st_geometry(locs_cat), add=TRUE)

Write thinned localities to file

# Write to file
st_write(locs_cat, here("analysis", "output", "thinned_localities", "catenatus_thinned.geojson"),
         delete_dsn = TRUE)

Crop environment

First, I need to read in the environmental data.

bioclims <- terra::rast(here("analysis", "output", "cropped_predictors", "bioclims.tif"))
forest_cover <- terra::rast(here("analysis", "output", "cropped_predictors", "forest_cover.tif"))

Cropping and combining variables for analysis.

mcp_cat <- st_convex_hull(st_union(locs_cat)) %>%
  st_buffer(dist = 0.5) %>% 
  terra::vect()
## although coordinates are longitude/latitude, st_union assumes that they are planar
## Warning in st_buffer.sfc(., dist = 0.5): st_buffer does not correctly buffer
## longitude/latitude data
## dist is assumed to be in decimal degrees (arc_degrees).
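The buffer warning above appears because the localities are in lon/lat, so the 0.5 degree buffer is computed on a planar approximation. A possible alternative (just a sketch; EPSG:5880, SIRGAS 2000 / Brazil Polyconic, and ~55 km as the rough equivalent of 0.5 degrees at these latitudes are my assumptions) is to buffer in a projected CRS:

```r
# sketch: buffer in a projected CRS (meters) instead of decimal degrees,
# then transform back to lon/lat for cropping
mcp_cat_proj <- st_convex_hull(st_union(locs_cat)) %>%
  st_transform(5880) %>%        # SIRGAS 2000 / Brazil Polyconic
  st_buffer(dist = 55000) %>%   # ~0.5 degrees at these latitudes
  st_transform(4326) %>%
  terra::vect()
```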
bioclims_cat <- bioclims %>%
  # the bioclims are in a slightly different CRS
  terra::project(forest_cover) %>% 
  terra::crop(mcp_cat) %>% 
  terra::mask(mcp_cat)

forest_cover_cat <- forest_cover %>% 
  terra::crop(bioclims_cat) %>% 
  terra::mask(bioclims_cat[[1:3]]) 

plot(bioclims_cat[[1]])

plot(forest_cover_cat[[1]])

predictors_cat <- c(bioclims_cat, forest_cover_cat)

Predictor correlation

Sample 10000 background points.

set.seed(19874)

# for variable correlations
bg_envt_cat <- terra::spatSample(predictors_cat, 10000, 
                               warn=TRUE, 
                               na.rm = TRUE, 
                               as.df = TRUE,
                               xy = TRUE)

# for use in ENMevaluate
bg_coords_cat <- bg_envt_cat[,c("x", "y")]

Next, I’ll extract the values for the localities and calculate the correlations among variables.

# extract values
bg_corr_cat <- bg_envt_cat %>% select(-x, -y)

# correlation dataframe
corr_df_cat <- corrr::correlate(bg_corr_cat, method = "pearson") 

# only retain correlations >= 0.75 for visualizing and deciding which to remove
corr_df_75_cat <- corr_df_cat %>% 
  # replace NAs first
  mutate(across(everything(), 
                ~replace_na(., 0)),
         # take absolute value of all values to make filtering easier. I'm not interested in the direction 
         across(where(is.numeric), abs)) %>% 
  # only keep values 
  mutate(across(everything(),
                ~ ifelse(. < 0.75, 0, .)))

corrr::rplot(corr_df_75_cat)

I’m prioritizing measures of climate extremes and environmental variability over averages to incorporate information about ecological limits rather than averages. In terms of bioclims, this comes out to BIO4 (temperature seasonality), BIO5 (max temp of the warmest month), BIO6 (min temp of the coldest month), BIO7 (annual temp range), BIO13 (prec of the wettest month), BIO14 (prec of the driest month), BIO15 (prec seasonality).
None of the forest cover variables are correlated with anything else.

Variable: what the variable is correlated with.
BIO4: Nothing
BIO5: BIO1, BIO9, BIO10, BIO11
BIO6: BIO1, BIO9, BIO10, BIO11
BIO7: BIO2
BIO13: BIO12, BIO16
BIO14: BIO12, BIO17, BIO19
BIO15: Nothing

The final variable list: BIO4, BIO5, BIO6, BIO7, BIO13, BIO14, BIO15, forest cover (3 vars)

predictors_cat <- predictors_cat[[c("bio_04", "bio_05", "bio_06", "bio_07", "bio_13", "bio_14", "bio_15", "fc_02_evergreen_broadleaf", "fc_03_deciduous_broadleaf", "fc_04_mixed")]]

Maxent model

I’m using a jackknife model evaluation approach since I only have 26 observations, and a previous attempt at spatial cross-validation led to wonky evaluation metrics.

set.seed(817)
coords_cat <- st_coordinates(locs_cat)
colnames(coords_cat) <- c("x", "y")

folds_cat <- ENMeval::get.jackknife(occ = coords_cat, 
                            bg = bg_coords_cat)

Run the model. Predictions are clamped to prevent extrapolation. This part takes a while: 29 minutes 57 seconds of wall-clock time on a 2020 M1 MacBook Air (16 GB of memory, using 6 cores), probably because of the high number of background points and the wide range of regularization multipliers I’m exploring.

set.seed(832)

# the vector of regularization multipliers to test
rms <- seq(0.5, 5, 0.5)

# convert the terra object to a raster stack for use in ENMeval
predictors_cat_stack <- raster::stack(predictors_cat)

# iterate model building over all chosen parameter settings
sdm_cat <-
  ENMeval::ENMevaluate(
    occs = coords_cat,
    envs = predictors_cat_stack,
    bg.coords = bg_coords_cat,
    RMvalues = rms,
    fc = c('L', 'LQ', 'H', 'LQH'),
    # clamping to prevent model extrapolation
    doClamp = TRUE,
    taxon.name = "catenatus",
    partitions = "user",
    # going to do a separate run with the final model on testing data
    #occs.testing = test_cat,
    user.grp = folds_cat,
    algorithm = "maxnet",
    parallel = TRUE,
    numCores = 6
  )
# write the model to file
write_rds(sdm_cat, here("analysis", "output", "sdm_models", "sdm_catenatus.rds"))

Model evaluation

Let’s take a look at the model results.

eval_table_cat <- sdm_cat@results
eval_mods_cat <- sdm_cat@models

names(eval_mods_cat) <-
  str_replace_all(names(eval_mods_cat), "\\.", "\\_")

Select the final model. First I’m looking at plots of model evaluation stats to get an idea of the spread of stats.

daic_cat <- ggplot(data = eval_table_cat, aes(x = rm, y = delta.AICc, color = fc)) +
  geom_point() +
  scale_color_viridis_d() +
  theme_bw()

or_cat <- ggplot(data = eval_table_cat, aes(x = rm, y = or.10p.avg, color = fc)) +
  geom_point() +
  scale_color_viridis_d() +
  theme_bw()

dauc_cat <- ggplot(data = eval_table_cat, aes(x = rm, y = auc.diff.avg, color = fc)) +
  geom_point() +
  scale_color_viridis_d() +
  theme_bw()

cbi_cat <- ggplot(data = eval_table_cat, aes(x = rm, y = cbi.val.avg, color = fc)) +
  geom_point() +
  scale_color_viridis_d() +
  theme_bw()

(daic_cat + or_cat) / (dauc_cat + cbi_cat)

Now I’m going to take a look at tables of delta AICc, omission rate, and AUC to see how close the models are. The model with the lowest AICc has a regularization multiplier of 4.5 and LQH feature classes. It also has a reasonable omission rate and AUC. I will go forward with this model.

eval_table_cat %>% 
  select(delta.AICc, AICc, or.10p.avg, auc.diff.avg, auc.val.avg, rm, fc) %>%
  arrange(delta.AICc) %>% 
  head(10) %>% 
  knitr::kable()
delta.AICc AICc or.10p.avg auc.diff.avg auc.val.avg rm fc
0.0000000 580.0415 0.1538462 0.0654867 0.8935250 4.5 LQH
0.2667773 580.3082 0.1923077 0.0633875 0.8907154 3 LQH
0.5665157 580.6080 0.1538462 0.0644193 0.8940500 5 LQH
2.2056193 582.2471 0.1538462 0.0644665 0.8912673 4 LQH
4.2386316 584.2801 0.1923077 0.0619812 0.8882731 3.5 LQH
4.3121942 584.3537 0.2307692 0.0676215 0.8894327 3.5 H
5.4844432 585.5259 0.2692308 0.0638505 0.8894404 2.5 LQH
7.2329204 587.2744 0.2692308 0.0685320 0.8921904 2 LQH
7.4221034 587.4636 0.2307692 0.0689630 0.8927596 5 H
8.0757581 588.1172 0.1538462 0.0768823 0.8899327 5 LQ

Select model.

mod_cat <- eval_mods_cat$rm_4_5_fc_LQH
opt_seq_tune_cat <- eval_table_cat$tune.args[which(eval_table_cat$delta.AICc == 0)][1]
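The AICc-then-omission-rate rule from the general steps can also be sketched programmatically (assuming the eval_table_cat and eval_mods_cat objects above; the name munging mirrors the earlier str_replace_all):

```r
# sketch: rank candidate models by delta AICc, breaking ties with the
# 10th-percentile omission rate, then pull the top model
opt_row_cat <- eval_table_cat %>%
  arrange(delta.AICc, or.10p.avg) %>%
  slice(1)

opt_name_cat <- str_replace_all(as.character(opt_row_cat$tune.args), "\\.", "_")
mod_opt_cat <- eval_mods_cat[[opt_name_cat]]
```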

Plot the response curves.
Only fc_02 and fc_04 have any effect on occurrence. Interesting! Suitability increases with evergreen broadleaf and decreases with mixed forest.

#plot(mod_cat, vars = c("fc_02_evergreen_broadleaf", "fc_04_mixed"), type = "cloglog")

Project

I’m projecting the model to the study area extent.

pred_cat <- ENMeval::eval.predictions(sdm_cat)[[opt_seq_tune_cat]]
plot(pred_cat)
plot(st_geometry(af), add = TRUE)
plot(st_geometry(locs_cat), pch = 21, bg = alpha("lightgray", 0.5), add = TRUE)

E. pictus

Reading in data for plotting.

# atlantic forest shapefile
af <- read_sf(here("analysis", "data", "atlantic_forest", "atlantic_forest.geojson"))

Spatial thin

Read and filter localities.

locs_pic <- read_csv(here("analysis", "data", "enyalius_locs.csv")) %>% 
  # the variable names are messy
  janitor::clean_names() %>%
  filter(species == "pictus",
    !is.na(latitude), !is.na(longitude),
    # remove duplicates
    !duplicated(latitude),
         # remove localities that are only mapped to a Google Earth municipality. The reserve in the "GPS of the reserve" locality is barely a km, so that is acceptable error relative to the environmental resolution
         source_lat_long != "Google Earth municipality") %>%
  # some lat/longs contain spaces; strip all whitespace before conversion
  mutate(longitude = str_remove_all(longitude, "\\s"), 
         latitude = str_remove_all(latitude, "\\s")) %>% 
  # convert to numeric
  mutate(longitude = as.numeric(longitude),
         latitude = as.numeric(latitude)) %>% 
  # convert to sf for spatial operations
  st_as_sf(coords = c("longitude", "latitude"),
           crs = 4326,
           remove = FALSE)
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   .default = col_character(),
##   `ng/ul` = col_double(),
##   Longitude = col_double(),
##   Reads = col_double()
## )
## ℹ Use `spec()` for the full column specifications.
# plot localities
plot(st_geometry(af))
plot(st_geometry(locs_pic), add = TRUE)

Spatial thin. I’m using a 5 km thinning distance.

set.seed(1873)

#run spthin algorithm. This returns 100 possible combinations of removed localities
output <-
  spThin::thin(
    locs_pic,
    'latitude',
    'longitude',
    'species',
    thin.par = 5,
    reps = 100,
    locs.thinned.list.return = TRUE,
    write.files = FALSE,
    verbose = FALSE
  )

# I want to maximize the # of localities returned  
maxThin <- which(sapply(output, nrow) == max(sapply(output, nrow)))

# if there are multiple iterations with the max # of localities, pick one
maxThin <-
  output[[ifelse(length(maxThin) > 1, maxThin[1], maxThin)]]

# get the unused locs as a testing data set before overwriting with the thinned set
# this probably isn't useful since they will overlap heavily with the training data, but seems like another piece of info to look at
test_pic <- locs_pic[-as.numeric(rownames(maxThin)), ]

# subset locs to match only thinned locs
locs_pic <- locs_pic[as.numeric(rownames(maxThin)), ]

plot(st_geometry(af))
plot(st_geometry(locs_pic), add=TRUE)

Write thinned localities to file

# Write to file
st_write(locs_pic, here("analysis", "output", "thinned_localities", "pictus_thinned.geojson"),
         delete_dsn = TRUE)

Crop environment

First, I need to read in the environmental data.

bioclims <- terra::rast(here("analysis", "output", "cropped_predictors", "bioclims.tif"))
forest_cover <- terra::rast(here("analysis", "output", "cropped_predictors", "forest_cover.tif"))

Cropping and combining variables for analysis.

mcp_pic <- st_convex_hull(st_union(locs_pic)) %>%
  st_buffer(dist = 0.5) %>% 
  terra::vect()
## although coordinates are longitude/latitude, st_union assumes that they are planar
## Warning in st_buffer.sfc(., dist = 0.5): st_buffer does not correctly buffer
## longitude/latitude data
## dist is assumed to be in decimal degrees (arc_degrees).
bioclims_pic <- bioclims %>%
  # the bioclims are in a slightly different CRS
  terra::project(forest_cover) %>% 
  terra::crop(mcp_pic) %>% 
  terra::mask(mcp_pic)

forest_cover_pic <- forest_cover %>% 
  terra::crop(bioclims_pic) %>% 
  terra::mask(bioclims_pic[[1:3]]) 

plot(bioclims_pic[[1]])
plot(st_geometry(locs_pic), add = TRUE)

plot(forest_cover_pic[[1]])
plot(st_geometry(locs_pic), add = TRUE)

predictors_pic <- c(bioclims_pic, forest_cover_pic)

Predictor correlation

Sample 10000 background points.

set.seed(888874)

# for variable correlations
bg_envt_pic <- terra::spatSample(predictors_pic, 10000, 
                               warn=TRUE, 
                               na.rm = TRUE, 
                               as.df = TRUE,
                               xy = TRUE)

# for use in ENMevaluate
bg_coords_pic <- bg_envt_pic[,c("x", "y")]

Next, I’ll extract the values for the localities and calculate the correlations among variables.

# extract values
bg_corr_pic <- bg_envt_pic %>% select(-x, -y)

# correlation dataframe
corr_df_pic <- corrr::correlate(bg_corr_pic, method = "pearson") 
## 
## Correlation method: 'pearson'
## Missing treated using: 'pairwise.complete.obs'
# only retain correlations >= 0.75 for visualizing and deciding which to remove
corr_df_75_pic <- corr_df_pic %>% 
  # replace NAs first
  mutate(across(everything(), 
                ~replace_na(., 0)),
         # take absolute value of all values to make filtering easier. I'm not interested in the direction 
         across(where(is.numeric), abs)) %>% 
  # only keep values 
  mutate(across(everything(),
                ~ ifelse(. < 0.75, 0, .)))

corrr::rplot(corr_df_75_pic)
## Don't know how to automatically pick scale for object of type noquote. Defaulting to continuous.

I’m prioritizing measures of climate extremes and environmental variability over averages to incorporate information about ecological limits rather than averages. In terms of bioclims, this comes out to BIO4 (temperature seasonality), BIO5 (max temp of the warmest month), BIO6 (min temp of the coldest month), BIO7 (annual temp range), BIO13 (prec of the wettest month), BIO14 (prec of the driest month), BIO15 (prec seasonality).
None of the forest cover variables are correlated with anything else.

Variable: what the variable is correlated with.
BIO4: Nothing
BIO5: BIO1, BIO9, BIO10, BIO11
BIO6: BIO1, BIO8, BIO10, BIO11
BIO7: BIO2, BIO15
BIO13: BIO16
BIO14: BIO12, BIO15, BIO17, BIO19
X BIO15: BIO2, BIO7, BIO14, BIO17, BIO19

BIO15 (prec seasonality) is a tough variable to model and it’s correlated with two other more reliable variables, so I’m omitting it from consideration.

The final variable list: BIO4, BIO5, BIO6, BIO7, BIO13, BIO14, forest cover (3 vars)

predictors_pic <- predictors_pic[[c("bio_04", "bio_05", "bio_06", "bio_07", "bio_13", "bio_14",  "fc_02_evergreen_broadleaf", "fc_03_deciduous_broadleaf", "fc_04_mixed")]]

Maxent model

I’m using a jackknife model evaluation approach since I only have 15 observations.

set.seed(77717)
coords_pic <- st_coordinates(locs_pic)
colnames(coords_pic) <- c("x", "y")

folds_pic <- ENMeval::get.jackknife(occ = coords_pic, 
                            bg = bg_coords_pic)

Run the model. Predictions are clamped to prevent extrapolation. This part also takes a while, probably because of the high number of background points and the wide range of regularization multipliers I’m exploring.

set.seed(839767)

# the vector of regularization multipliers to test
rms <- seq(0.5, 5, 0.5)

# convert the terra object to a raster stack for use in ENMeval
predictors_pic_stack <- raster::stack(predictors_pic)

# iterate model building over all chosen parameter settings
sdm_pic <-
  ENMeval::ENMevaluate(
    occs = coords_pic,
    envs = predictors_pic_stack,
    bg.coords = bg_coords_pic,
    RMvalues = rms,
    fc = c('L', 'LQ', 'H', 'LQH'),
    # clamping to prevent model extrapolation
    doClamp = TRUE,
    taxon.name = "pictus",
    partitions = "user",
    # going to do a separate run with the final model on testing data
    #occs.testing = test_pic,
    user.grp = folds_pic,
    algorithm = "maxent.jar",
    parallel = TRUE,
    numCores = 6
  )
# write the model to file
write_rds(sdm_pic, here("analysis", "output", "sdm_models", "sdm_pictus.rds"))

Model evaluation

Let’s take a look at the model results.

eval_table_pic <- sdm_pic@results
eval_mods_pic <- sdm_pic@models

names(eval_mods_pic) <-
  str_replace_all(names(eval_mods_pic), "\\.", "\\_")

Select the final model. First I’m looking at plots of model evaluation stats to get an idea of the spread of stats.

daic_pic <- ggplot(data = eval_table_pic, aes(x = rm, y = delta.AICc, color = fc)) +
  geom_point() +
  scale_color_viridis_d() +
  theme_bw()

or_pic <- ggplot(data = eval_table_pic, aes(x = rm, y = or.10p.avg, color = fc)) +
  geom_point() +
  scale_color_viridis_d() +
  theme_bw()

dauc_pic <- ggplot(data = eval_table_pic, aes(x = rm, y = auc.diff.avg, color = fc)) +
  geom_point() +
  scale_color_viridis_d() +
  theme_bw()

cbi_pic <- ggplot(data = eval_table_pic, aes(x = rm, y = cbi.val.avg, color = fc)) +
  geom_point() +
  scale_color_viridis_d() +
  theme_bw()

(daic_pic + or_pic) / (dauc_pic + cbi_pic)
## Warning: Removed 16 rows containing missing values (geom_point).
## Warning: Removed 39 rows containing missing values (geom_point).

Now I’m going to take a look at tables of delta AICc, omission rate, and AUC to see how close the models are. The model with the lowest AICc has a regularization multiplier of 1 and L feature class. It also has a reasonable omission rate and AUC. I will go forward with this model.

eval_table_pic %>% 
  select(delta.AICc, AICc, or.10p.avg, auc.diff.avg, auc.val.avg, rm, fc) %>%
  arrange(delta.AICc) %>% 
  head(10) %>% 
  knitr::kable()
delta.AICc AICc or.10p.avg auc.diff.avg auc.val.avg rm fc
0.000000 360.8966 0.2000000 0.1767605 0.8308133 1 L
3.202613 364.0992 0.1333333 0.2340286 0.7917133 2 L
3.343612 364.2402 0.2000000 0.1736979 0.8298133 1 LQ
3.354566 364.2511 0.1333333 0.2190329 0.7917933 1.5 L
3.874711 364.7713 0.1333333 0.2281050 0.7964267 3 LQ
4.865598 365.7622 0.2000000 0.1900395 0.8124133 1.5 LQ
5.519450 366.4160 0.1333333 0.2357205 0.7880600 2.5 L
5.734405 366.6310 0.1333333 0.2302443 0.7934133 3.5 LQ
6.113226 367.0098 0.1333333 0.2270914 0.7955733 2.5 LQ
7.375265 368.2718 0.2666667 0.2149500 0.7920933 2 LQ

Select model.

mod_pic <- eval_mods_pic$rm_1_fc_L
opt_seq_tune_pic <- eval_table_pic$tune.args[which(eval_table_pic$delta.AICc == 0)][1]

Variable importance. It looks like forests are important again, in addition to precipitation.

plot(mod_pic)

Plot the response curves. BIO14, fc_02, and fc_03 are positively related to presence, while fc_04 is weakly negatively related.

dismo::response(mod_pic)
## Loading required namespace: rJava

Project

I’m projecting the model to the study area extent.

pred_pic <- ENMeval::eval.predictions(sdm_pic)[[opt_seq_tune_pic]]
plot(pred_pic)
plot(st_geometry(af), add = TRUE)
plot(st_geometry(locs_pic), pch = 21, bg = alpha("lightgray", 0.5), add = TRUE)

E. perditus

Reading in data for plotting.

# atlantic forest shapefile
af <- read_sf(here("analysis", "data", "atlantic_forest", "atlantic_forest.geojson"))

Spatial thin

Read and filter localities.

locs_per <- read_csv(here("analysis", "data", "enyalius_locs.csv")) %>% 
  # the variable names are messy
  janitor::clean_names() %>%
  filter(species == "perditus",
    !is.na(latitude), !is.na(longitude),
    # remove duplicates
    !duplicated(latitude),
         # remove localities that are only mapped to a Google Earth municipality. The reserve in the "GPS of the reserve" locality is barely a km, so that is acceptable error relative to the environmental resolution
         source_lat_long != "Google Earth municipality") %>%
  # some lat/longs contain spaces; strip all whitespace before conversion
  mutate(longitude = str_remove_all(longitude, "\\s"), 
         latitude = str_remove_all(latitude, "\\s")) %>% 
  # convert to numeric
  mutate(longitude = as.numeric(longitude),
         latitude = as.numeric(latitude)) %>% 
  # convert to sf for spatial operations
  st_as_sf(coords = c("longitude", "latitude"),
           crs = 4326,
           remove = FALSE)
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   .default = col_character(),
##   `ng/ul` = col_double(),
##   Longitude = col_double(),
##   Reads = col_double()
## )
## ℹ Use `spec()` for the full column specifications.
# plot localities
plot(st_geometry(af))
plot(st_geometry(locs_per), add = TRUE)

Spatial thin. I’m using a 5 km thinning distance.

set.seed(111733)

#run spthin algorithm. This returns 100 possible combinations of removed localities
output <-
  spThin::thin(
    locs_per,
    'latitude',
    'longitude',
    'species',
    thin.par = 5,
    reps = 100,
    locs.thinned.list.return = TRUE,
    write.files = FALSE,
    verbose = FALSE
  )

# I want to maximize the # of localities returned  
maxThin <- which(sapply(output, nrow) == max(sapply(output, nrow)))

# if there are multiple iterations with the max # of localities, pick one
maxThin <-
  output[[ifelse(length(maxThin) > 1, maxThin[1], maxThin)]]

# get the unused locs as a testing data set before overwriting with the thinned set
# this probably isn't useful since they will overlap heavily with the training data, but seems like another piece of info to look at
test_per <- locs_per[-as.numeric(rownames(maxThin)), ]

# subset locs to match only thinned locs
locs_per <- locs_per[as.numeric(rownames(maxThin)), ]

plot(st_geometry(af))
plot(st_geometry(locs_per), add=TRUE)

Write thinned localities to file

# Write to file
st_write(locs_per, here("analysis", "output", "thinned_localities", "perditus_thinned.geojson"),
         delete_dsn = TRUE)

Crop environment

First, I need to read in the environmental data.

bioclims <- terra::rast(here("analysis", "output", "cropped_predictors", "bioclims.tif"))
forest_cover <- terra::rast(here("analysis", "output", "cropped_predictors", "forest_cover.tif"))

Cropping and combining variables for analysis.

mcp_per <- st_convex_hull(st_union(locs_per)) %>%
  st_buffer(dist = 0.5) %>% 
  terra::vect()

bioclims_per <- bioclims %>%
  # the bioclims are in a slightly different CRS
  terra::project(forest_cover) %>% 
  terra::crop(mcp_per) %>% 
  terra::mask(mcp_per)

forest_cover_per <- forest_cover %>% 
  terra::crop(bioclims_per) %>% 
  terra::mask(bioclims_per[[1:3]]) 

plot(bioclims_per[[1]])
plot(st_geometry(locs_per), add = TRUE)

plot(forest_cover_per[[1]])
plot(st_geometry(locs_per), add = TRUE)

predictors_per <- c(bioclims_per, forest_cover_per)

Predictor correlation

Sample 10000 background points.

set.seed(86784)

# for variable correlations
bg_envt_per <- terra::spatSample(predictors_per, 10000, 
                               warn=TRUE, 
                               na.rm = TRUE, 
                               as.df = TRUE,
                               xy = TRUE)

# for use in ENMevaluate
bg_coords_per <- bg_envt_per[,c("x", "y")]

Next, I’ll extract the values for the localities and calculate the correlations among variables.

# extract values
bg_corr_per <- bg_envt_per %>% select(-x, -y)

# correlation dataframe
corr_df_per <- corrr::correlate(bg_corr_per, method = "pearson") 
## 
## Correlation method: 'pearson'
## Missing treated using: 'pairwise.complete.obs'
# only retain correlations >= 0.75 for visualizing and deciding which to remove
corr_df_75_per <- corr_df_per %>% 
  # replace NAs first
  mutate(across(everything(), 
                ~replace_na(., 0)),
         # take absolute value of all values to make filtering easier. I'm not interested in the direction 
         across(where(is.numeric), abs)) %>% 
  # only keep values 
  mutate(across(everything(),
                ~ ifelse(. < 0.75, 0, .)))

corrr::rplot(corr_df_75_per)
## Don't know how to automatically pick scale for object of type noquote. Defaulting to continuous.

I’m prioritizing measures of climate extremes and environmental variability over averages to incorporate information about ecological limits rather than averages. In terms of bioclims, this comes out to BIO4 (temperature seasonality), BIO5 (max temp of the warmest month), BIO6 (min temp of the coldest month), BIO7 (annual temp range), BIO13 (prec of the wettest month), BIO14 (prec of the driest month), BIO15 (prec seasonality).
None of the forest cover variables are correlated with anything else.

Variable: what the variable is correlated with.
BIO4: Nothing
X BIO5: BIO1, BIO6, BIO8, BIO9, BIO10, BIO11
BIO6: BIO1, BIO5, BIO8, BIO9, BIO10, BIO11
BIO7: BIO2
BIO13: BIO16
BIO14: BIO15, BIO17, BIO19
X BIO15: BIO14, BIO17, BIO19

BIO5 is correlated with BIO6, and BIO6 (min temp of the coldest month) is more likely to limit the distribution of lizards in general, so I’m omitting BIO5 from consideration.
BIO15 (prec seasonality) is a tough variable to model and it’s correlated with BIO14, so I’m omitting it from consideration.

The final variable list: BIO4, BIO6, BIO7, BIO13, BIO14, forest cover (3 vars)

predictors_per <- predictors_per[[c("bio_04", "bio_06", "bio_07", "bio_13", "bio_14",  "fc_02_evergreen_broadleaf", "fc_03_deciduous_broadleaf", "fc_04_mixed")]]

Maxent model

I’m using a jackknife model evaluation approach since I only have 17 observations.

set.seed(4567717)
coords_per <- st_coordinates(locs_per)
colnames(coords_per) <- c("x", "y")

folds_per <- ENMeval::get.jackknife(occ = coords_per, 
                            bg = bg_coords_per)
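As a quick sanity check (assuming the folds_per object from above), jackknife partitioning should assign each of the 17 occurrences to its own group, with all background points left in group 0:

```r
# each occurrence gets its own leave-one-out fold
length(unique(folds_per$occs.grp))

# background points are not withheld, so they should all share a single group
unique(folds_per$bg.grp)
```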

Run the model. Predictions are clamped to prevent extrapolation.

set.seed(97697)

# the vector of regularization multipliers to test
rms <- seq(0.5, 5, 0.5)

# convert the terra object to a raster stack for use in ENMeval
predictors_per_stack <- raster::stack(predictors_per)

# iterate model building over all chosen parameter settings
sdm_per <-
  ENMeval::ENMevaluate(
    occs = coords_per,
    envs = predictors_per_stack,
    bg.coords = bg_coords_per,
    RMvalues = rms,
    fc = c('L', 'LQ', 'H', 'LQH'),
    # clamping to prevent model extrapolation
    doClamp = TRUE,
    taxon.name = "perditus",
    partitions = "user",
    # going to do a separate run with the final model on testing data
    #occs.testing = test_per,
    user.grp = folds_per,
    algorithm = "maxent.jar",
    parallel = TRUE,
    numCores = 6
  )
# write the model to file
write_rds(sdm_per, here("analysis", "output", "sdm_models", "sdm_perditus.rds"))

Model evaluation

Let’s take a look at the model results.

eval_table_per <- sdm_per@results
eval_mods_per <- sdm_per@models

names(eval_mods_per) <-
  str_replace_all(names(eval_mods_per), "\\.", "_")

Select the final model. First I’m looking at plots of model evaluation stats to get an idea of the spread of stats.

daic_per <- ggplot(data = eval_table_per, aes(x = rm, y = delta.AICc, color = fc)) +
  geom_point() +
  scale_color_viridis_d() +
  theme_bw()

or_per <- ggplot(data = eval_table_per, aes(x = rm, y = or.10p.avg, color = fc)) +
  geom_point() +
  scale_color_viridis_d() +
  theme_bw()

dauc_per <- ggplot(data = eval_table_per, aes(x = rm, y = auc.diff.avg, color = fc)) +
  geom_point() +
  scale_color_viridis_d() +
  theme_bw()

cbi_per <- ggplot(data = eval_table_per, aes(x = rm, y = cbi.val.avg, color = fc)) +
  geom_point() +
  scale_color_viridis_d() +
  theme_bw()

(daic_per + or_per) / (dauc_per + cbi_per)
## Warning: Removed 9 rows containing missing values (geom_point).
## Warning: Removed 36 rows containing missing values (geom_point).

Now I’m going to take a look at tables of delta AICc, omission rate, and AUC to see how close the models are. The model with the lowest AICc has a relatively high average omission rate (0.24), so I chose a model with a lower omission rate (0.12) that is only 1.15 AICc units away (fc LQ, rm 3.5). Its higher regularization multiplier (3.5) also makes it a simpler, smoother model, which is nice.

eval_table_per %>% 
  select(delta.AICc, AICc, or.10p.avg, auc.diff.avg, auc.val.avg, rm, fc) %>%
  arrange(delta.AICc) %>% 
  head(10) %>% 
  knitr::kable()
delta.AICc AICc or.10p.avg auc.diff.avg auc.val.avg rm fc
0.000000 399.7811 0.2352941 0.1590243 0.8509294 1.5 LQ
1.145829 400.9270 0.1176471 0.1626583 0.8432147 3.5 LQ
1.887445 401.6686 0.1176471 0.1587989 0.8463735 4 LQ
1.952217 401.7334 0.1764706 0.1670697 0.8342529 2.5 LQ
2.718898 402.5000 0.1764706 0.1639746 0.8432941 2 LQ
2.743563 402.5247 0.1176471 0.1600559 0.8453382 4.5 LQ
2.942407 402.7236 0.1764706 0.1681412 0.8385588 1 L
2.969725 402.7509 0.2352941 0.1568537 0.8510000 1 LQ
3.036275 402.8174 0.1176471 0.1679704 0.8376206 2.5 L
3.722699 403.5038 0.1176471 0.1602868 0.8447382 5 LQ
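The same judgment call can be expressed in code. A sketch (assuming eval_table_per from above) that keeps models within 2 AICc of the best and then prefers the lowest omission rate, breaking ties toward lower delta AICc:

```r
# candidate models close to the AICc optimum, ranked by omission rate
eval_table_per %>%
  filter(delta.AICc <= 2) %>%
  arrange(or.10p.avg, delta.AICc) %>%
  slice(1) %>%
  select(tune.args, delta.AICc, or.10p.avg, auc.val.avg)
```

This is one way to encode the rule; the final call still deserves a look at the full table.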

Select model.

mod_per <- eval_mods_per$rm_3_5_fc_LQ
opt_seq_tune_per <- eval_table_per$tune.args[eval_table_per$tune.args == "rm.3.5_fc.LQ"]

Variable importance. It looks like forests are important again!

plot(mod_per)

Plot the response curves. fc_02 is positively related to presence, while fc_04 is negatively related.

dismo::response(mod_per)

Project

I’m projecting the model to the study area extent.

pred_per <- ENMeval::eval.predictions(sdm_per)[[opt_seq_tune_per]]
plot(pred_per)
plot(st_geometry(af), add = TRUE)
plot(st_geometry(locs_per), pch = 21, bg = alpha("lightgray", 0.5), add = TRUE)
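Beyond the continuous map, a binary suitability map can be sketched using the common 10th-percentile training presence threshold. This is an illustrative extra step, not part of the pipeline above; it assumes the pred_per raster and coords_per matrix from earlier.

```r
# suitability values at the training presences
suit_at_occs <- raster::extract(pred_per, coords_per)

# 10th percentile training presence threshold
thr_per <- quantile(suit_at_occs, 0.10, na.rm = TRUE)

# binary presence/absence map
plot(pred_per >= thr_per)
```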

E. iheringii

Reading in data for plotting.

# atlantic forest shapefile
af <- read_sf(here("analysis", "data", "atlantic_forest", "atlantic_forest.geojson"))

Spatial thin

Read and filter localities.

locs_ihe <- read_csv(here("analysis", "data", "enyalius_locs.csv")) %>% 
  # the variable names are messy
  janitor::clean_names() %>%
  filter(species == "iheringii",
    !is.na(latitude), !is.na(longitude),
    # remove duplicates
    !duplicated(latitude),
         # remove localities that are only mapped to a Google Earth municipality. The reserve in the "GPS of the reserve" locality is barely a km, so that is acceptable error relative to the environmental resolution
         source_lat_long != "Google Earth municipality") %>%
  # there are some lat/longs with spaces. Need to fix this
  mutate(longitude = str_remove(longitude, "\\s"), 
         latitude = str_remove(latitude, "\\s")) %>% 
  # convert to numeric
  mutate(longitude = as.numeric(longitude),
         latitude = as.numeric(latitude)) %>% 
  # convert to sf for spatial operations
  st_as_sf(coords = c("longitude", "latitude"),
           crs = 4326,
           remove = FALSE)
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   .default = col_character(),
##   `ng/ul` = col_double(),
##   Longitude = col_double(),
##   Reads = col_double()
## )
## ℹ Use `spec()` for the full column specifications.
# plot localities
plot(st_geometry(af))
plot(st_geometry(locs_ihe), add = TRUE)

Spatial thin. I’m using a 5 km thinning distance.

set.seed(1333333)

#run spthin algorithm. This returns 100 possible combinations of removed localities
output <-
  spThin::thin(
    locs_ihe,
    'latitude',
    'longitude',
    'species',
    thin.par = 5,
    reps = 100,
    locs.thinned.list.return = TRUE,
    write.files = FALSE,
    verbose = FALSE
  )

# I want to maximize the # of localities returned  
maxThin <- which(sapply(output, nrow) == max(sapply(output, nrow)))

# if there are multiple iterations with the max # of localities, pick one
maxThin <-
  output[[ifelse(length(maxThin) > 1, maxThin[1], maxThin)]]

# get the unused locs as a testing data set (before overwriting locs_ihe!)
# this probably isn't useful since they will overlap heavily with the training data, but seems like another piece of info to look at
test_ihe <- locs_ihe[-as.numeric(rownames(maxThin)), ]

# subset locs to match only thinned locs
locs_ihe <- locs_ihe[as.numeric(rownames(maxThin)), ]

plot(st_geometry(af))
plot(st_geometry(locs_ihe), add=TRUE)

Write thinned localities to file

# Write to file
st_write(locs_ihe, here("analysis", "output", "thinned_localities", "iheringii_thinned.geojson"),
         delete_dsn = TRUE)

Crop environment

First, I need to read in the environmental data.

bioclims <- terra::rast(here("analysis", "output", "cropped_predictors", "bioclims.tif"))
forest_cover <- terra::rast(here("analysis", "output", "cropped_predictors", "forest_cover.tif"))

Cropping and combining variables for analysis.

mcp_ihe <- st_convex_hull(st_union(locs_ihe)) %>%
  st_buffer(dist = 0.5) %>% 
  terra::vect()

bioclims_ihe <- bioclims %>%
  # the bioclims are in a slightly different CRS
  terra::project(forest_cover) %>% 
  terra::crop(mcp_ihe) %>% 
  terra::mask(mcp_ihe)

forest_cover_ihe <- forest_cover %>% 
  terra::crop(bioclims_ihe) %>% 
  terra::mask(bioclims_ihe[[1:3]]) 

plot(bioclims_ihe[[1]])
plot(st_geometry(locs_ihe), add = TRUE)

plot(forest_cover_ihe[[1]])
plot(st_geometry(locs_ihe), add = TRUE)

predictors_ihe <- c(bioclims_ihe, forest_cover_ihe)

Predictor correlation

Sample 10000 background points.

set.seed(7457684)

# for variable correlations
bg_envt_ihe <- terra::spatSample(predictors_ihe, 10000, 
                               warn=TRUE, 
                               na.rm = TRUE, 
                               as.df = TRUE,
                               xy = TRUE)

# for use in ENMevaluate
bg_coords_ihe <- bg_envt_ihe[,c("x", "y")]

Next, I’ll calculate the correlations among variables using the background environmental values.

# drop the coordinate columns, keeping only the environmental values
bg_corr_ihe <- bg_envt_ihe %>% select(-x, -y)

# correlation dataframe
corr_df_ihe <- corrr::correlate(bg_corr_ihe, method = "pearson") 
## 
## Correlation method: 'pearson'
## Missing treated using: 'pairwise.complete.obs'
# only retain correlations >= 0.75 for visualizing and deciding which to remove
corr_df_75_ihe <- corr_df_ihe %>% 
  # replace NAs first
  mutate(across(everything(), 
                ~replace_na(., 0)),
         # take absolute value of all values to make filtering easier. I'm not interested in the direction 
         across(where(is.numeric), abs)) %>% 
  # zero out correlations below the 0.75 cutoff
  mutate(across(everything(),
                ~ ifelse(. < 0.75, 0, .)))

corrr::rplot(corr_df_75_ihe)
## Don't know how to automatically pick scale for object of type noquote. Defaulting to continuous.

I’m prioritizing measures of climate extremes and environmental variability over averages, to capture ecological limits rather than typical conditions. In terms of bioclims, this comes out to BIO4 (temperature seasonality), BIO5 (max temp of the warmest month), BIO6 (min temp of the coldest month), BIO7 (annual temp range), BIO13 (prec of the wettest month), BIO14 (prec of the driest month), and BIO15 (prec seasonality).
None of the forest cover variables are correlated with anything else.

Variable: what the variable is correlated with.
X BIO4: BIO3, BIO14, BIO15, BIO17, BIO19
X BIO5: BIO1, BIO6, BIO8, BIO9, BIO10, BIO11
BIO6: BIO1, BIO5, BIO8, BIO9, BIO10, BIO11, BIO13, BIO15
BIO7: nothing
BIO13: BIO1, BIO6, BIO11, BIO15, BIO16, BIO18
BIO14: BIO4, BIO15, BIO17, BIO19
X BIO15: BIO4, BIO6, BIO11, BIO13, BIO14, BIO17, BIO19

BIO4 is correlated with BIO14 and BIO15, and as a seasonality variable it may be less precise or reliable than a min/max value, so I’m omitting it.
BIO5 is correlated with BIO6, and BIO6 (min temp of the coldest month) is more likely to limit lizard distributions in general, so I’m omitting BIO5 from consideration.
Even though the correlation between BIO6 and BIO13 is slightly above the cutoff (r = 0.759), I’m keeping both because they are ecologically important variables and the correlation only barely exceeds the cutoff.
BIO15 (prec seasonality) is a difficult variable to model and is correlated with four other variables, so I’m omitting it from consideration.

The final variable list: BIO6, BIO7, BIO13, BIO14, forest cover (3 vars)

predictors_ihe <- predictors_ihe[[c("bio_06", "bio_07", "bio_13", "bio_14",  "fc_02_evergreen_broadleaf", "fc_03_deciduous_broadleaf", "fc_04_mixed")]]

Maxent model

I’m using a jackknife model evaluation approach since I only have 25 observations.

set.seed(34444437)
coords_ihe <- st_coordinates(locs_ihe)
colnames(coords_ihe) <- c("x", "y")

folds_ihe <- ENMeval::get.jackknife(occ = coords_ihe, 
                            bg = bg_coords_ihe)

Run the model. Predictions are clamped to prevent extrapolation.

set.seed(2699997)

# the vector of regularization multipliers to test
rms <- seq(0.5, 5, 0.5)

# convert the terra object to a raster stack for use in ENMeval
predictors_ihe_stack <- raster::stack(predictors_ihe)

# iterate model building over all chosen parameter settings
sdm_ihe <-
  ENMeval::ENMevaluate(
    occs = coords_ihe,
    envs = predictors_ihe_stack,
    bg.coords = bg_coords_ihe,
    RMvalues = rms,
    fc = c('L', 'LQ', 'H', 'LQH'),
    # clamping to prevent model extrapolation
    doClamp = TRUE,
    taxon.name = "iheringii",
    partitions = "user",
    # going to do a separate run with the final model on testing data
    #occs.testing = test_ihe,
    user.grp = folds_ihe,
    algorithm = "maxent.jar",
    parallel = TRUE,
    numCores = 6
  )
# write the model to file
write_rds(sdm_ihe, here("analysis", "output", "sdm_models", "sdm_iheringii.rds"))

Model evaluation

Let’s take a look at the model results.

eval_table_ihe <- sdm_ihe@results
eval_mods_ihe <- sdm_ihe@models

names(eval_mods_ihe) <-
  str_replace_all(names(eval_mods_ihe), "\\.", "_")

Select the final model. First I’m looking at plots of model evaluation stats to get an idea of the spread of stats.

daic_ihe <- ggplot(data = eval_table_ihe, aes(x = rm, y = delta.AICc, color = fc)) +
  geom_point() +
  scale_color_viridis_d() +
  theme_bw()

or_ihe <- ggplot(data = eval_table_ihe, aes(x = rm, y = or.10p.avg, color = fc)) +
  geom_point() +
  scale_color_viridis_d() +
  theme_bw()

dauc_ihe <- ggplot(data = eval_table_ihe, aes(x = rm, y = auc.diff.avg, color = fc)) +
  geom_point() +
  scale_color_viridis_d() +
  theme_bw()

cbi_ihe <- ggplot(data = eval_table_ihe, aes(x = rm, y = cbi.val.avg, color = fc)) +
  geom_point() +
  scale_color_viridis_d() +
  theme_bw()

(daic_ihe + or_ihe) / (dauc_ihe + cbi_ihe)
## Warning: Removed 9 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_point).

Now I’m going to take a look at tables of delta AICc, omission rate, and AUC to see how close the models are. All of the top models have the same omission rate, similar AUC, and use either LQ or L feature classes. I chose the simplest model within 2 AICc of the top model: rm 2.5 with a linear feature class. The top model was rm 2.5 with linear-quadratic features, but a marginally simpler model is preferable within reasonable limits. After skimming through them, the top models are all very similar.

eval_table_ihe %>% 
  select(delta.AICc, AICc, or.10p.avg, auc.diff.avg, auc.val.avg, rm, fc) %>%
  arrange(delta.AICc) %>% 
  head(10) %>% 
  knitr::kable()
delta.AICc AICc or.10p.avg auc.diff.avg auc.val.avg rm fc
0.0000000 573.5871 0.12 0.1449758 0.780908 2.5 LQ
0.8621843 574.4493 0.12 0.1442875 0.778476 3 LQ
1.0877604 574.6749 0.12 0.1469163 0.776960 2.5 L
1.8720880 575.4592 0.12 0.1433178 0.775012 3.5 LQ
2.0589366 575.6460 0.12 0.1475302 0.784392 1 L
2.0741377 575.6612 0.12 0.1461770 0.783924 1.5 LQ
2.2319478 575.8190 0.12 0.1464295 0.772632 3 L
2.4313093 576.0184 0.12 0.1431907 0.771308 5 LQ
2.7105533 576.2976 0.12 0.1473680 0.782280 1.5 L
2.7292762 576.3164 0.12 0.1457163 0.782756 2 LQ

Select model.

mod_ihe <- eval_mods_ihe$rm_2_5_fc_L
opt_seq_tune_ihe <- eval_table_ihe$tune.args[eval_table_ihe$tune.args == "rm.2.5_fc.L"]

Variable importance. It looks like forests are important again!

plot(mod_ihe)

Plot the response curves. BIO6 is positively related to presence, BIO7 is negatively related, BIO14 is negatively related, fc_02 is positively related, and fc_03 is positively related. It’s interesting that BIO14, precipitation of the driest month, is negatively related. I would expect the lizards to avoid drier conditions, but maybe I’m missing something about their natural history. I’ll have to go back and reread the literature I have on them.

dismo::response(mod_ihe)

Project

I’m projecting the model to the study area extent. It’s not predicting strong suitability for the southern localities. I’ll have to look at the genetic structure and locality info more closely to see what’s up with them.

pred_ihe <- ENMeval::eval.predictions(sdm_ihe)[[opt_seq_tune_ihe]]
plot(pred_ihe)
plot(st_geometry(af), add = TRUE)
plot(st_geometry(locs_ihe), pch = 21, bg = alpha("lightgray", 0.5), add = TRUE)